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Abstract 

In this paper we present a novel method for solving optimization problems governed by 
partial differential equations. Existing methods use gradient information in marching toward 
the minimum, where the constrained PDE is solved once (sometimes only approximately) per 
each optimization step. Such methods can be viewed as a marching techniques on the intersection 
of the state and costate hypersurfaces while improving the residuals of the design equation per 
each iteration. In contrast, the method presented here march on the design hypersurface and 
at each iteration improve the residuals of the state and costate equations. The new method 
is usually much less expensive per iteration step, since in most problems of practical interest 
the design equation involves much fewer unknowns than either the state or costate equations. 
Convergence is shown using energy estimates for the evolution equations governing the iterative 
process. Numerical tests shows that the new method allows the solution of the optimization 
problem in cost equivalent to solving the analysis problem just a few times, independent of the 
number of design parameters. The method can be applied using single grid iterations as well as 
with multigrid solvers. 
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(ICASE), NASA Langley Research Center, Hampton, VA 23681-0001. 
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1 Introduction 


In the last few years there has been a growing interest in the numerical solution of optimization 
problems governed by large scale problems. This new interest is a direct result of the improvement 
in computer technology. Probably the most challenging problems are those which involve complex 
governing equations such as the Euler, Navier- Stokes, Acoustic wave, and Maxwell’s. Some global 
quantities governed by the solutions of such equations are required to be minimized (maximized) 
in terms of some prescribed design variables. The resulting mathematical problem is formulated 
as a constrained optimization problem which can sometimes be viewed as a control problem. Most 
existing algorithms use gradient information for reaching the minimum, possibly together with 
preconditioners for accelerating convergence. 

Efficient gradient calculation can be done using the adjoint equations, and in area of aerody- 
namics design, this approach was first suggested in [J]. There the steepest descent method was 
employed there and the adjoint equations were used for efficient calculation of gradients. In this 
approach, each optimization step requires the solution of the state and costate equations and an 
efficient implementation is achieved by using multigrid methods for both equations. No acceleration 
of the optimization process was involved in this work. 

The one shot method proposed in [Tl] for control problems, also uses the adjoint method to- 
gether with multigrid acceleration for state and costate, but also include an acceleration of the 
minimization process. Its development so far has been for problems with elliptic partial differential 
equations as constraints. The main idea is that smooth perturbations in the data of the prob- 
lem introduce smooth changes in the solution, and highly oscillatory changes in the data produces 
highly oscillatory changes in the solution. Moreover, highly oscillatory changes are localized. These 
observations enable the construction of very efficient optimization procedure, in addition to very 
efficient solvers for the state and costate variables. Design variables that correspond to smooth 
changes in the solution are solved for on coarse levels and those corresponding to highly oscilla- 
tory changes are solved for on appropriate finer grids. The resulting method can be viewed as a 
preconditioning of the gradient descent method where the new condition number is independent 
of the grid size, and is of order 1. Thus, within a few optimization iterations one reaches the 
minimum. The method was first developed for a small dimensional parameter space, where the 
optimization was done on the coarsest grid, yet converging to the fine grid solution [Tl]. Later 
in [TKS1], [TKS2] the method was applied to cases of a moderate number of design variables and 
where the optimization was done on few of the coarsest grids. The extension of these ideas to the 
infinite dimensional parameter space was done in [AT1],[AT2] where both boundary control as well 
as shape design problems were treated. In [AT1],[AT2] an important new analysis for the structure 
of the functional near the minimum was introduced. That analysis also enables the construction 
of efficient relaxation for multigrid methods and preconditioners for single grid techniques. More- 
over, it can give essential information about the structure of the minimum including the condition 
number for the optimization problem, the well-posedness (ill-posedness) of the problem, and can 
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suggest appropriate regularization techniques. Experiments with the one shot method for finite 
dimensional and infinite dimensional design spaces showed that the convergence rate is practically 
independent of the number of design parameters. 

The necessity of using multigrid algorithms in the one shot methods is certainly a disadvantage 
since in many engineering applications the underlying solvers do not use multigrid methods. This 
drawback has led to inquiries in other directions, but still aiming at algorithms that solve the full 
optimization problem in one shot, i.e. , in a cost not much larger that that of solving the analysis 
problem. 

The first observation made was that the solution when using the adjoint method is an intersec- 
tion point of three hypersurfaces describing the state equation, costate-state equation and design 
equation (together forming the necessary conditions of the minimization problem). The adjoint 
method can be viewed as marching on the intersection of the hypersurfaces corresponding to state 
and costate variables, in the direction of the intersection with the design hypersurface. Since in 
most applications the number of design variables is significantly smaller than the number of state 
or costate variables, marching in the design hypersurface is a much less expensive process than the 
adjoint method, and may serve as a solution process for the optimization problem. 

Methods that march on the design hypersurface- are not based on gradients and their convergence 
properties are different. In this paper we construct and analyze methods of this type by embedding 
the necessary conditions into an evolution equation so that the solution evolves in the design 
hypersurface. Energy estimates are used to prove convergence. 

The new methods which are stable approximations to evolution processes can be accelerated us- 
ing multigrid or other acceleration techniques. Numerical results for model problems are presented 
and demonstrate the effectiveness of the method. It is shown that the full optimization problem is 
solved in a computer time equivalent to just a few solutions of the analysis problem. The method 
seems to converge in a rate independent of the number of design variables. 

2 On Adjoint Methods 

We consider the following constrained minimization problem 

min E(u, 4>(u)) (1) 

U 

L(u,<j>) = 0 ( 2 ) 

where L(u , .) is a partial differential operator (possibly nonlinear) defined on a Hilbert space H of 
functions defined on a domain Cl. The design variable is assumed to be in some other Hilbert space 
U , for example, functions defined on the boundary dCl, or part of it. 

The (formal) necessary conditions are 

L(u,(f > ) = 0 State Equation 

L*^X + E<f> = 0 Costate Equation (3) 

L* u \ + E u = 0 Design Equation 


2 



and we assume the existence of solutions for both the state and costate equations. 

It can be shown that the gradient of E(u,<f>(u)) is given by 

A{u) = Ll\(<f>,u) + Eu{u,<f>(u)) (4) 

where 4>(u), u) are the solution of the state and costate equations. The quantity -A(u) can 
serve as a minimization direction (steepest descent). 

The adjoint method consists of solving the state and costate equations at each update step of 
the design variables. Thus is can be viewed as an approximation to the following evolution process. 


L(u , 4>) = 0 

( 5 ) 

4- E^ = 0 

(6) 

ju + L* u \ + E u = 0 

( 7 ) 


where ^ u represent the derivative of u with respect to the pseudo-time variable introduced into 
the problem. The actual iteration method is obtained by replacing ^ u with (u n+1 — u n )/6t , for a 
sufficiently small 6t 

The full algorithm can be viewed as a solver for the equation 

A{u) = 0 (8) 

for the variable u. A crucial quantity to consider for analyzing the efficiency of different algorithms 
is the mapping 

u — ♦ .A(u) (9) 

For problems arising from partial differential equation this mapping is a differential or a pseudo- 
differential operator and bad conditioning is anticipated. Preconditioning of basic iterative methods 
such as the steepest descent, is needed. 

The one shot methods [T1],[TKS1],[AT1],[AT2] were aiming at a preconditioning of the gradient 
algorithm in such a way that an order one condition number is obtained. In such a case the number 
of minimization step required to reach the minimum is independent of the size of the problem, i.e., 
the number of unknowns for u. This approach was found to be very successful for elliptic equations. 
The idea is to exploit the locality of high frequencies in the algorithm, as well as the fact that high 
frequencies in the design variables are related to high frequencies of the state variable and vice 
versa. Finite and infinite dimensional design spaces have been considered with application to 
aerodynamics problems, and other shape design problems. 

Another possible direction, which was not explored, is to construct single grid preconditioners 
based on the form of the symbol of the operator A. This idea will be discussed elsewhere. 
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Figure 1: Hypcrsurfaces for slate, costate and design equations 

3 The New Approach 

The solution of the minimization problem is the intersection point of the hypcrsurfaces defined 
by state, costate, and design equations, see figure 1. Gradient descent methods for constrained 
optimization problems march along the intersection of the state and costate hypcrsurfaces. Each 
step in such a process requires the solution of two large scale problems, namely, the discretized 
PDEs. Since in many applications the number of unknowns in either the state or costate equations 
is significantly larger than that in the design equation, marching on the hypersurface defined by 
the design equation is a much less expensive process than that of marching along the state and the 
costate hypcrsurfaces. This is the main idea of the new approach. 

Each step in the minimization algorithms presented here improve the solution of the state and 
costate equations, for example, by improving the distance to the hypcrsurfaces defined by the state 
and costate equations. In addition each step is such that the approximate solution lies on the design 
hypersu rface. 

The construction of algorithms that march along the design hypersurface and converge to the 
minimum of the opt imization problem can be done for a wide class of problems governed by PDE. 
The approach taken here is to look at iterative methods for the solution of the state and costate 
equations as a stable approximation to the evolution equations governed by the constrained PDE. 
The construction of the method is done in two steps. In the first the stationary PDE (the necessary 
conditions) is embedded into an evolution PDE for which the solution evolves in the design hyper- 
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surface, and an energy estimate ensuring convergence is derived. The second step involves a stable 
and consistent discretization of the pseudo-time dependent problem which is usually straightfor- 
ward. 

A technical difficulty which needs some explanation is related to the problem of staying on the 
design hypersurface. Assume that we are given an iterative method for calculating the solutions 
of the state and costate equations. Let the change produced in <f> and A be </> and A respectively. 
In order to remain on the design hypersurface it is necessary to calculate a change in u, namely, u 
such that 


A{u + Uj <f> -t- A + A) — 0 

(10) 

An approximation to this equation is 

dA _ dA j 

du d<f> d\ 

(11) 


T hi s representation is useful when is an invertible operator. Note that the solution of this 
equation involves a system whose size is identical to that of the number of design variables, which 
is significantly smaller than that of the state or the costate equations. While the operator §£ + 
^<{> u + fjA u is invertible, it is not convenient to work with; however A u = §£ is simple and easy 
to manipulate. 

In practice, A u may not be invertible and the update of the it requires a different process. In 
problems arising from partial differential equations in which the design variables are defined on 
the boundary only, the design equation is an additional boundary condition for the system, for the 
extra unknowns, namely, the design variables. In that case per each iteration step of the method 
presented here require the simultaneous solution of the three boundary conditions for the state 
equation, the costate equation and the design equation. These three conditions together involve 
only a fraction more work than that of the boundary conditions for the state and costate equations. 
In cases where the set of the three boundary conditions cannot be solved for the boundary state, 
costate, and design variables, one should include equations from the interior. This is a typical case 
when considering systems of partial differential equations in several dimensions. 

In case that the linearized operator is coercive and the design equation can be solved for the 
design variables, keeping the state and costate variables fixed, one can view the method presented 
here as an approximation to the following time dependent problem 


j t <f>+L(u,<f>) = 0 

(12) 

j t \ + Ll\ + Et = 0 

(13) 

L u A + E u = 0 

(14) 


where the last equation is essentially an extra boundary condition for the design variables. 
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4 Examples 


In this section we show a few examples of using the idea outlined in the previous section. We prove 
an energy estimate for each of the examples, ensuring convergence. 

Example I: Distributed Control Let ft C M n and consider the optimization problem 


min ^ / (<f> — 4>* ) 2 dx 4- -<x f u 2 
“ 2 Jn 2 Jn 

subject to 

J A <f> = u ft 

\ 4 > = 0 9ft 

The necessary conditions are 

= u ft 

AA + <f> = 4>* ft 

< uu — A = 0 ft 

<j> = 0 9ft 

A = 0 9ft 

Consider the pseudo-time embedding 

= A (j> - u ft 

^A = AA + (f>- 4>* ft 
au - A = 0 ft 

<f> = 0 9ft 

A = 0 9ft 


(15) 


(16) 


( 17 ) 


(18) 


We show that the error term in <f>, A, u, tend to zero as t approaches infinity. These error terms 
satisfy the same equations as their corresponding quantities (f>,\,u but with zero source terms. So 
without loss of generality we consider our problem with 4>* = 0. 

The proof uses Poincare’s inequality in the form 

||V’|| 2 <C(||V^|| 2 + (^d 5 ) 2 ) C>0 (19) 

where T C 9ft, and C > 0 is a constant independent of ^ € H l (Cl). The norm used above and in 
the rest of the paper is the L 2 norm on ft. The use of this theorem will be for functions vanishing 
on part of the boundary denoted by T. 

For this example we take T = 9ft since the boundary condition for the errors in both <f> and 
A is zero on 9ft. Multiplying the first two equations in (17) by (f> and A respectively we get using 
integration by parts and the Poincare inequality 


~(<r IMI 2 + i|A|| 2 ) = -crllV^II 2 - ||VA || 2 < -C(a||d >|| 2 + ||A|| 2 ) (20) 

2 at 
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for some constant C > 0, independent of <£, A. 

This implies that a||</>|| 2 + ||A|| 2 decay exponentially with rate exp(-Ct). That is, the pseudo- 
time embedding converges to the minimum, at a rate determined by the constant C . 

Example II: Boundary Control 

The next example is of a boundary control. Let f l C iR 77 , U T2 = d£l, Ti fl T2 = 0 and 
consider 




( 21 ) 


subject to 


r a<^> = 0 ft 
1 - u r, 

I dn ~ U 11 

l 4 > = 0 r 2 


The necessary conditions are 


' A</> = 0 

fl 

AA = 0 

fl 

d / = u 

on 

on 


Ti 

au + A = 0 

Ti 

^ = 0 

r 2 

, A = 0 

r 2 


The time embedding used for this problem is 

&<t> = &4> 

= AA 
H _ 


= u 

d\ 


dn 

-!* + * = *• 
au + A = 0 

<j> = 0 

A = 0 


n 

Q 

on 
r 1 

h 

r 2 

r 2 


( 22 ) 


(23) 


(24) 


In this case the use of Poincare’s inequality is done for T = IV Similarly to the previous 
example it can be shown that 

—MMl 2 + Pll 2 ) = -^IIV^II 2 - fl V A|| 2 < -C{<T\\(j>\\ 2 4- I1AI1 2 ) (25) 

z at 

with a different constant than that of example I. Again this estimate implies the exponential decay 
of errors. Thus, convergence of <j> and A is ensured, and therefore also of u from the Neumann 
boundary condition for <f>. 
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Example III: System of First Order 

Let x e M d , A\,...,Ad be symmetric constant matrices, <j> = .. .,<£„) defined on fl C lR d . 

We introduce the decomposition of <j)\dn — (<f>+, <fro, 4>~) 35 follows. Let A = (Ai, . . . , Ad) and n be 
the outward normal to the boundary d£l. The matrix A • n is symmetric and therefore has real 
eigenvalues and a complete set of eigenvectors. Let <j) = (<£_, <f>o,<t>+) be a decomposition into the 
direct sum of the subspaces of A - n corresponding to negative, zero and positive eigenvalues. For 
simplicity we also assume that A • n has zero eigenvalues on isolated points on the boundary dfl. 
Consider the following problem 

min ^ / (<f>- - g) 2 ds + f u 2 ds (26) 

u 2 Jr 1 2 Jr 1 

where <j> is the solution of 

= o n 

(A-n )+</>+ = u T 1 (27) 

(A • n)+<£ + = 0 r 2 

where U T 2 = cKl , Ti nr 2 = 0 We further assume that there exist a constant K > 0 such that if 
<f>_ = 0, A + = 0 for a time interval larger than Ii then <t> = A = 0. 

The necessary conditions for the above optimization problem are 

A — n Q 

_y'd a . JU - n O 

2^j=i ^Jdxj-v “ 

(A ■ n)+<f>+ = u Ti 

(A • n)_A_ + r)<t> = rjg T x (28) 

A_|_ + crrju = 0 T i 

(A • n )+<p+ = 0 r 2 

(A • n)_A_ = 0 T 2 

where rj is an arbitrary positive number. We use it to derive the convergence estimate. Consider 
the following time embedding 




(A-n )+<f> + = u Ti 

(A-n)_A -+r}<j>=T)g T i 
A + + ( 77]U = 0 r 1 

(A • n ) + <t> + = 0 r 2 

(A • n)_A_ = 0 r 2 


(29) 


For analysis of the behavior of the errors we take g = 0 and using integration by parts we obtain 


^(ll^ll 2 + Pll 2 ) = < (A • n)+*+,<4 > + < (A • «)_</>_,<£_ > - 
< (A • n) + A + , A + > - < (A • n)_A_, A_ > 


(30) 
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where the norms denote i 2 norms on and < > denotes inner products on the boundary dSl. 

Eliminating the u from the boundary condition we obtain the following boundary condition that 
must be satisfied by <j), A 


(A-n)_A_ + #_ = 0 IT (31) 

A + + 77<j(A • n) _ V + = 0 Ti (32) 

Substituting these into the energy estimate we obtain 

H(H\\ 2 + ll A ll 2 ) = < (A-n)_ -(A-n):V-,^- >n - 

< (A • n) + — j(A ■ n) + 1 A + ,A + > Pj (33) 

< (A-n )_<£_,<£_ > ?2 - < (A • n) + A + , A + > Fj 

The conditions 

(A • n)_ - t? 2 (A • n)! 1 < 0 
(A • n)+ - -^p-(A • n) + > 0 

imply that the changes in energy are non-positive. That is, it is either decreasing or stabilized. 
Stabilization of the energy can occur only for the value zero, since otherwise it means that there 
exists a non zero solution to the evolution equation such that <j > _ and A+ are zero for all times. 
This is in contradiction to the assumption about the constraint PDE. 

Since T} was arbitrary in this analysis, we can choose it small enough so that the first condition 
holds. Then if a is large enough the second condition will hold as well. Thus, we obtain convergence 
if a is not too small. 


5 Numerical Results 

In this section we demonstrate the effectiveness of the methods discussed here for an optimization 
problem governed by inviscid incompressible flow. Let Q = {(x,t/)|0 <x<l,0<j/<l), 

r x = {(x, o)|o < x < i} , r 2 = {(x, i)|o < x < i} 

The minimization problem is given by 


min \ [ (<f> — <t>o) 2 dx + -r] f u 2 dx (35) 

“ 2Jri ^ ^ri 

subject to 

&4> — 0 

t = “ Tl < 36 ) 

<£ = 5 (x) r 2 

and all functions are assumed to be periodic in the x direction. 
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5.1 Finite Dimensional Design Space 

We assume that u has the form 




j~ 1 


( 37 ) 


where Qj are constants to be determined and fj(x ) are prescribed functions. The necessary con- 
ditions for this problem are easily derived and we use the following time embedding as a solution 
process 


P - AA = 

94- - „ 

A = 0 
<f> = 9 

fo 0 )dx + rjoij = 0 


ft 

ft 

Ti 

r i 
r 2 


j = 1 


(38) 


This time dependent process was approximated by Jacobi relaxation, where at each time step all 
boundary conditions are satisfied. Residuals history is given in fig 2 and shows that the convergence 
rate is independent of the number of design variables. 


5.2 Infinite Dimensional Design Space 

In this case we look for u in a function space, namely, £ 2 ( 0, 1). The necessary conditions are 
stationary solution of the following time evolution equation which was used in the computation. 


£ t <f)-A4> = o n 

f t \ - AA = 0 ft 

= “ r i 

4> = g(x ) Ti 

+ ri 

A(r,0) -I- rju(x) = 0 Tj 

a = o r 2 


(39) 


This time dependent process was approximated by Jacobi relaxation, where at each time step all 
boundary conditions are satisfied. Residuals history is given in fig 3 and shows that the convergence 
rate is independent of rj. It can be seen from that figure that the number of iteration for the analysis 
prpblem and for the full optimization problem are not much different. 
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Figure 2: Residuals History for analysis and full optimization method, grid 32x32. T] = 0. 








Figure 3: Residuals History for analysis and full optimization method, grid 32x32 

6 Conclusions 

In this paper we have introduced pseudo-time methods for the efficient solution of optimization 
problems governed by partial differential equations. In these methods the marching toward the 
solution of the optimization problem is done on the design hypersurface rather than the intersection 
of the hypersurfaces for state and costate equations. Very efficient solvers are obtained as indicated 
from the proofs as well as from the numerical examples included. The methods allow the solution of 
the full optimization problem in a computational cost similar to that of solving the constrained PDE. 
The methods do not require gradient calculation however, it is essential to use it with the adjoint 
equations. The methods offer an alternative to gradient descent methods. Their implementation is 
straightforward and can be done using multigrid algorithms or single grid iteration. 
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